57 research outputs found

    Data Collections Explorer – An Information System for the Engineering Sciences

    The poster provides an overview of and an introduction to the Data Collections Explorer, an information system for engineering-related resources such as repositories, databases, and archives, as well as datasets published independently of these structures. In addition, the poster describes current developments and gives an outlook on future work. The target audience is scientists who are either searching for data sets or looking to publish their own. Scientists seeking to share or publish their research data sets are confronted with questions such as: What is an appropriate repository for publishing the data sets? Do repositories restrict the size of the data sets that can be uploaded, and what is that limit? Are publication fees charged? Scientists searching for data sets are confronted with questions such as: Are data sets available to aid in research? Are benchmarks available to check results? Are these data sets available under an open-access license? The Data Collections Explorer is a low-threshold web service for sharing and discovering research data. To help users get an overview and answer the above questions, it provides a free-text search as well as drop-down menus to filter its entries by the following criteria:

    • Hosting Institution
    • Type of Service
    • Subject Area
    • Open Access

    A stable version is currently in use by NFDI4Ing (https://data-collections.nfdi4ing.de). The current development version features a graph-based data model; a preview version will be made available later this year. This knowledge graph can be accessed via a SPARQL interface and provides more flexibility than the currently deployed version.
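    A SPARQL interface like the one described can be queried programmatically. The sketch below, a hypothetical illustration only, builds a query that mirrors the drop-down filters; the prefix and predicate names (ex:subjectArea, ex:openAccess) are assumptions, since the actual vocabulary of the knowledge graph is not given here.

    ```python
    # Hypothetical sketch: composing a SPARQL query against the Data Collections
    # Explorer knowledge graph. Predicate names are invented for illustration.

    def build_query(subject_area: str, open_access: bool = True) -> str:
        """Return a SPARQL query filtering entries by subject area and access."""
        return f"""
    PREFIX ex: <https://data-collections.nfdi4ing.de/vocab#>
    SELECT ?entry ?title WHERE {{
      ?entry ex:title ?title ;
             ex:subjectArea "{subject_area}" ;
             ex:openAccess {str(open_access).lower()} .
    }}
    """

    query = build_query("Fluid Mechanics")
    ```

    Such a query could then be sent to the SPARQL endpoint with any standard client, which is the flexibility gain over the fixed drop-down filters of the deployed version.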

    Data Life Cycle Lab. Key Technologies. Status 2013. Big Data in Science


    Neural Classifier Systems for Histopathologic Diagnosis

    Neural network and statistical classification methods were applied to derive an objective grading for moderately and poorly differentiated lesions, based on characteristics of the nuclear placement patterns. Using a multilayer network after abbreviated training as a feature extractor, followed by a quadratic Bayesian classifier, allowed grade assignment agreeing with visual diagnostic consensus in 96% of fields from the training set of 500 fields, and in 77% of the 130 fields of a test set.
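    The second stage of the pipeline, a quadratic Bayesian classifier, fits a Gaussian with its own covariance matrix to each class and assigns a sample to the class with the highest log-posterior. This is a minimal NumPy sketch on toy 2-D features, not the authors' code; the feature-extraction network is omitted.

    ```python
    import numpy as np

    def fit_qda(X, y):
        """Estimate per-class mean, covariance, and prior probability."""
        params = {}
        for c in np.unique(y):
            Xc = X[y == c]
            params[c] = (Xc.mean(axis=0), np.cov(Xc, rowvar=False), len(Xc) / len(X))
        return params

    def predict_qda(params, x):
        """Assign x to the class with the highest Gaussian log-posterior."""
        best, best_score = None, -np.inf
        for c, (mu, cov, prior) in params.items():
            diff = x - mu
            score = (-0.5 * np.log(np.linalg.det(cov))
                     - 0.5 * diff @ np.linalg.inv(cov) @ diff
                     + np.log(prior))
            if score > best_score:
                best, best_score = c, score
        return best

    # Toy stand-in for extracted nuclear-placement features (two classes).
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
    y = np.array([0] * 50 + [1] * 50)
    model = fit_qda(X, y)
    ```

    In the paper's setup, the inputs to this classifier would be the features produced by the briefly trained multilayer network rather than raw measurements.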

    FAIR Digital Object Concept for Composing Machine Learning Training Data

    In this poster we introduce how the FAIR¹ Digital Object (FAIR DO) concept can simplify the composition of training data sets for Machine Learning (ML). Training data sets from heterogeneous sources mostly have different label terms. Therefore, composing them for application in ML requires laborious relabeling. To automate this process, the FAIR DO concept can be applied. A FAIR DO is an informative representation of scientific data, e.g. a training data set, that makes the data interpretable and actionable for computer systems. For applicability in the context of ML, a FAIR DO requires at least a globally unique Persistent Identifier (PID), mandatory metadata, and a data type. With the self-contained structure of a FAIR DO, the associated label information can be accessed. Here, we show this structure and explain how it facilitates access to label information. Moreover, specialized clients and tools are needed for fully automated acting on FAIR DOs and relabeling. Using FAIR DOs this way could also address other laborious steps in ML training data composition, like feature or file reformatting. The described work is based on the results of the RDA IG FAIR Digital Object Fabric². The outcome contributes to the contents of the RDA IG FAIR for ML³. This work has been supported by the research program ‘Engineering Digital Futures’ of the Helmholtz Association of German Research Centers and the Helmholtz Metadata Collaboration Platform⁴.

    (1) FAIR Guiding Principles of scientific data to improve Findability, Accessibility, Interoperability and Reuse of digital assets https://www.go-fair.org/fair-principles/
    (2) FAIR Digital Object Fabric IG https://www.rd-alliance.org/group/FAIR-digital-object-fabric-ig.html
    (3) FAIR for ML https://www.rd-alliance.org/defining-fair-machine-learning-ml
    (4) Helmholtz Metadata Collaboration (HMC) Platform https://www.hmc-plattform.org/e
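    The self-contained structure the abstract describes (a PID, mandatory metadata, and a data type, with label terms reachable through the metadata) can be sketched as a small data structure. All field names below are illustrative assumptions, not the RDA specification, and the relabeling step stands in for the automated clients the poster calls for.

    ```python
    from dataclasses import dataclass, field

    # Hedged sketch of a FAIR DO: PID + mandatory metadata + data type.
    # Field and key names are invented for illustration.

    @dataclass
    class FairDO:
        pid: str                                      # globally unique Persistent Identifier
        data_type: str                                # registered data type of the object
        metadata: dict = field(default_factory=dict)  # mandatory metadata record

        def labels(self):
            """Access associated label terms without fetching the data itself."""
            return self.metadata.get("labelTerms", [])

    def relabel(fdo: FairDO, mapping: dict) -> FairDO:
        """Map heterogeneous label terms onto a common vocabulary."""
        fdo.metadata["labelTerms"] = [mapping.get(t, t) for t in fdo.labels()]
        return fdo

    fdo = FairDO(pid="21.T11148/example", data_type="ml-training-set",
                 metadata={"labelTerms": ["auto", "truck"]})
    relabel(fdo, {"auto": "car"})
    ```

    The point of the structure is that a client can resolve the PID, read the label terms from the metadata, and remap them automatically, which is the relabeling step the poster wants to take out of human hands.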

    Seminar Big Data Applications. Sammelband. Sommersemester 2013


    FAIR Digital Object Ecosystem Testbed

    This poster describes the development of a testbed for FAIR Digital Objects, consisting of an ecosystem of interacting services to demonstrate mandatory and optional FAIR use cases and to identify gaps in the specifications. Preprocessing data for research (finding, accessing, unifying, or converting it) takes up to 80% of research time. The FAIR (Findability, Accessibility, Interoperability, Reusability) principles aim to support and facilitate the reuse of data and therefore tackle this problem. A FAIR Digital Object (FAIR DO) is one way to encapsulate research data resources of all kinds (raw data, metadata, software, ...) so that they follow the FAIR principles. A FAIR DO ecosystem can be regarded as a set of services that enable the creation and use of such FAIR DOs. Besides basic functionality like PID management and PID record validation, it may also offer assistive services, e.g. the automated building of a search index to allow reverse-searching of PIDs. A FAIR DO ecosystem must reach a certain level of maturity before it can be used productively. To identify gaps in specifications and concepts during development and use, as well as to demonstrate necessary and optional use cases, we developed a testbed for FAIR use cases as part of the HMC project, which is easy to set up and run on everyday computers. Currently, the testbed enables PID record management and validation using a PIT service implementation following the RDA PID Information Types (PIT) Working Group recommendations and an externally hosted Data Type Registry following the RDA Data Type Registry Working Group recommendations. It also features automated indexing of PID records (proof of concept) and provides an implementation of the Collection API specification published by the corresponding RDA Research Data Collections Working Group.
    The most important gap identified, and the most difficult to close, is the design and specification of the profiles that determine the content of PID records. Software used by researchers will need these contents to determine whether, and to what extent, it can use a FAIR Digital Object, so this gap must be given strong consideration. The testbed development has been supported by the research program ‘Engineering Digital Futures’ of the Helmholtz Association of German Research Centers and the Helmholtz Metadata Collaboration Platform.
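    Profile-based PID record validation, the gap the poster highlights, amounts to checking a record against a profile that declares which attributes are mandatory. This is a minimal sketch under that assumption; the attribute names are invented, not taken from any registered PIT profile.

    ```python
    # Illustrative sketch of profile-based PID record validation.
    # A profile (here a plain dict) lists the attributes a PID record
    # must carry; attribute names are hypothetical.

    PROFILE = {
        "mandatory": ["digitalObjectType", "digitalObjectLocation", "license"],
    }

    def validate_record(record: dict, profile: dict = PROFILE) -> list:
        """Return the mandatory attributes missing from a PID record."""
        return [attr for attr in profile["mandatory"] if attr not in record]

    record = {"digitalObjectType": "21.T11148/...", "license": "CC-BY-4.0"}
    missing = validate_record(record)
    ```

    A real PIT service would additionally resolve each attribute's data type in a Data Type Registry and check the value against it; the hard, unsolved part per the poster is agreeing on the profiles themselves.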

    FAIR Digital Object for Accessing Label Information of ML Training Data Stored in a Metadata Schema

    In this poster, we introduce how the FAIR¹ Digital Object (FAIR DO) concept can simplify the access to schema-based label information of Machine Learning (ML) training data. Training data sets from heterogeneous sources mostly have different label terms. Therefore, composing them for application in ML comes with the cost of laborious relabeling. To ease this process by automation, the FAIR DO concept can be applied. A FAIR DO is an informative representation of scientific data, e.g. an ML training data set, that makes the data interpretable and actionable for computer systems. For applicability in the context of ML, a FAIR DO requires at least a globally unique Persistent Identifier (PID), mandatory metadata, and a data type. Label information of a training data set can be described using a proper metadata schema. With the self-contained structure of a FAIR DO, the associated label information can be accessed. Here, we show this structure and explain how it facilitates access to label information. The latter is stored in a document that is based on a custom metadata schema built for ML. The schema provides a structure for basic label description and assignment of label terms to the elements in the training data set. With this, we point out the advantage of describing label information with a metadata schema in conjunction with the FAIR DO structure. Moreover, specialized clients and tools are needed for fully automated acting on FAIR DOs and relabeling. Extending the metadata schema with controlled vocabularies and ontologies is required for a more consistent data description. Using FAIR DOs this way could also address other laborious steps in ML training data composition, like feature or file reformatting. This work has been supported by the research program ‘Engineering Digital Futures’ of the Helmholtz Association of German Research Centers and the Helmholtz Metadata Collaboration Platform.

    [1] Mark D. Wilkinson et al., Scientific Data, 3, 160018 (2016)
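    A label document of the kind the abstract describes (basic label descriptions plus an assignment of label terms to training-set elements) might look like the following. The structure and key names are assumptions for illustration; the actual custom schema is not reproduced in the abstract.

    ```python
    # Hedged sketch of a schema-based label document: a list of label
    # definitions plus per-element assignments. Keys are hypothetical.

    LABEL_DOCUMENT = {
        "labels": [
            {"term": "car", "definition": "four-wheeled motor vehicle"},
            {"term": "bicycle", "definition": "two-wheeled pedal vehicle"},
        ],
        "assignments": [
            {"element": "img_0001.png", "term": "car"},
            {"element": "img_0002.png", "term": "bicycle"},
        ],
    }

    def check_assignments(doc: dict) -> bool:
        """Every assignment must reference a term defined in the label list."""
        defined = {label["term"] for label in doc["labels"]}
        return all(a["term"] in defined for a in doc["assignments"])

    ok = check_assignments(LABEL_DOCUMENT)
    ```

    Validating assignments against the declared terms is the kind of consistency check that controlled vocabularies and ontologies, which the poster names as future work, would strengthen further.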

    FAIR Digital Object Application Case for Composing Machine Learning Training Data

    The application case for implementing and using the FAIR Digital Object (FAIR DO) concept aims to simplify access to label information for composing Machine Learning (ML) training data.
